Skip to content

Automated Test: embed-url-handling-post #319

Closed

Conversation

admin-coderabbit
Copy link
Owner

@admin-coderabbit admin-coderabbit commented Feb 4, 2026

This pull request was automatically created by @coderabbitai/e2e-reviewer.

Batch created pull request.

Summary by CodeRabbit

  • New Features
    • Enable embedding Discourse discussions on external websites with automatic iframe-based comment integration and synchronization
    • Automatically poll RSS/ATOM feeds at scheduled intervals and import feed content as discussion topics
    • Add configuration settings to define allowed embedding hosts, feed polling sources, and topic authors

@coderabbit-eval
Copy link

coderabbit-eval bot commented Feb 4, 2026

📝 Walkthrough

Walkthrough

This pull request introduces embedding functionality, allowing external websites to embed Discourse discussions via iframe, alongside RSS/ATOM feed polling capabilities to automatically create topics from remote feed sources.

Changes

Cohort / File(s) Summary
Dependencies & Configuration
Gemfile, config/site_settings.yml, config/routes.rb
Added ruby-readability and simple-rss gems; introduced four new site settings (embeddable_host, feed_polling_enabled, feed_polling_url, embed_by_username); added /embed/best route.
Localization
config/locales/client.en.yml, config/locales/server.en.yml
Added embedding category label and 6 embed-related locale keys (title, start_discussion, continue, loading, permalink, imported_from) plus site setting descriptions.
Frontend Assets
app/assets/javascripts/embed.js, app/assets/stylesheets/embed.css.scss
New embed.js module with iframe creation and postMessage-based resize handling; new SCSS stylesheet defining layout and typography for embedded comments.
Embed Views
app/views/embed/best.html.erb, app/views/embed/loading.html.erb, app/views/layouts/embed.html.erb
Three new templates: best displays embedded posts with header/footer and logo, loading shows a loading message with auto-refresh, and embed layout handles stylesheet and parent window resize messaging.
Embed Controller & Model
app/controllers/embed_controller.rb, app/models/topic_embed.rb
New EmbedController with best action handling topic retrieval and validation; new TopicEmbed model managing embedded content with import, import_remote, and URL absolutization methods.
Core Model Updates
app/models/post.rb
Added cook_methods enum supporting :regular and :raw_html; raw_html bypasses post cooking pipeline.
Topic Retrieval & Polling
lib/topic_retriever.rb, app/jobs/regular/retrieve_topic.rb, app/jobs/scheduled/poll_feed.rb
New TopicRetriever class orchestrates safe embed retrieval with host validation and Redis-based throttling; RetrieveTopic job handles async retrieval; PollFeed job periodically imports RSS/ATOM feeds into topics.
Library Updates
lib/post_creator.rb, lib/post_revisor.rb, lib/tasks/disqus.thor
PostCreator now respects cook_method option; PostRevisor supports skip_validations; Disqus import task refactored to use TopicEmbed.import_remote instead of PostCreator with category.
Database Migrations
db/migrate/20131210181901_..., db/migrate/20131217174004_create_topic_embeds.rb, db/migrate/20131219203905_add_cook_method_to_posts.rb, db/migrate/20131223171005_...
Created topic_embeds table with unique embed_url index; added cook_method integer column to posts; minor formatting adjustment to word counts migration; added force: true to top_topics table creation.
Test Suite
spec/components/topic_retriever_spec.rb, spec/controllers/embed_controller_spec.rb, spec/jobs/poll_feed_spec.rb, spec/models/topic_embed_spec.rb
Comprehensive specs covering TopicRetriever host/throttle validation, EmbedController request flow and header handling, PollFeed job conditional execution, and TopicEmbed import/validation behavior.

Sequence Diagram(s)

sequenceDiagram
    participant Browser
    participant ExternalSite
    participant DiscourseServer
    participant Redis
    participant DB as Database

    ExternalSite->>Browser: Load page with embed.js
    Browser->>DiscourseServer: Create iframe to /embed/best?embed_url=...
    
    DiscourseServer->>DiscourseServer: ensure_embeddable validates referer
    DiscourseServer->>DB: Check TopicEmbed for embed_url
    alt Topic exists
        DB-->>DiscourseServer: topic_id found
        DiscourseServer->>DB: Fetch TopicView (best posts)
        DB-->>DiscourseServer: posts data
    else Topic doesn't exist
        DB-->>DiscourseServer: no topic_id
        DiscourseServer->>Redis: Check throttle for embed_url
        alt Not throttled
            Redis-->>DiscourseServer: proceed
            DiscourseServer->>DiscourseServer: Queue RetrieveTopic job
        else Recently retrieved
            Redis-->>DiscourseServer: throttled
        end
        DiscourseServer->>Browser: Render loading view
        Browser->>Browser: Auto-reload after 30s
    end
    
    DiscourseServer->>Browser: Render embedded comments
    Browser->>Browser: postMessage listener ready
    Browser-->>ExternalSite: Set iframe height
Loading
sequenceDiagram
    participant Scheduler
    participant PollFeed as PollFeed Job
    participant SimpleRSS
    participant RemoteFeed as Remote Feed
    participant TopicEmbed
    participant DB as Database

    Scheduler->>PollFeed: Execute (hourly)
    PollFeed->>PollFeed: Validate settings (enabled, url, username)
    
    alt Settings incomplete
        PollFeed->>PollFeed: Return early
    else All settings present
        PollFeed->>DB: Resolve embed_by_username to User
        PollFeed->>RemoteFeed: Fetch feed URL
        RemoteFeed-->>SimpleRSS: Feed XML/ATOM
        SimpleRSS-->>PollFeed: Parsed feed items
        
        loop For each feed item
            PollFeed->>PollFeed: Extract URL and content
            PollFeed->>PollFeed: Unescape HTML
            PollFeed->>TopicEmbed: import(user, url, title, content)
            TopicEmbed->>DB: Create or update topic/post
            DB-->>TopicEmbed: Result
        end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 Hops excitedly while nibbling code

A embedding so fine, threads now shine,
RSS feeds hop into the feed—refine!
iframes dance with ALLOWALL grace,
While Redis throttles at proper pace.
From distant sites, voices now convene,
twitches nose The finest embed I've seen! 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Title check ⚠️ Warning The title 'Automated Test: embed-url-handling-post' does not accurately describe the primary changes in this pull request, which implement a comprehensive embedding feature with feed polling, topic retrieval, and API components. Replace the title with a descriptive summary of the main feature, such as 'Implement feed embedding and topic import functionality' or 'Add embedded comments and RSS feed polling support'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch embed-url-handling-post

Comment @coderabbit eval help to get the list of available commands and usage tips.

@coderabbit-eval coderabbit-eval bot closed this Feb 4, 2026
Copy link

@coderabbit-eval coderabbit-eval bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

🤖 Fix all issues with AI agents
In `@app/assets/javascripts/embed.js`:
- Around line 5-12: The code assumes the DOM element with id
'discourse-comments' exists and calls comments.appendChild(iframe), which will
throw if comments is null; update the logic where comments is obtained (variable
comments referencing document.getElementById('discourse-comments')) to guard
against null by checking if comments is truthy before creating/appending the
iframe (or return/exit early), so only create iframe and set
iframe.src/attributes and call comments.appendChild(iframe) when comments is
present.
- Around line 15-24: postMessageReceived currently uses
discourseUrl.indexOf(e.origin) which is vulnerable to host-substring bypasses;
change the origin check to compare origins exactly by using the URL origin
comparison (e.g. compute new URL(discourseUrl).origin and compare it ===
e.origin) or maintain a whitelist array of allowed origins and check e.origin
against that list with strict equality; update the validation in
postMessageReceived to use e.origin === allowedOrigin (or
allowedOrigins.includes(e.origin)) and leave the rest of the message handling
(e.data.type === 'discourse-resize' and iframe.height assignment) unchanged.

In `@app/jobs/scheduled/poll_feed.rb`:
- Around line 24-38: The poll_feed method lacks network/feed error handling and
calls i.content.scrub without nil checks; wrap the feed fetch and parse
(open(SiteSetting.feed_polling_url) and SimpleRSS.parse) in a begin/rescue to
catch and log exceptions, and skip processing on failure, and before calling
.scrub on i.content ensure i.content is present (e.g., use a safe fallback like
an empty string or i.description) so TopicEmbed.import is never called with nil
content; log any errors with context (feed URL and exception) and continue
processing remaining items.

In `@app/models/topic_embed.rb`:
- Line 13: The HTML injection risk comes from interpolating the raw url into the
I18n string in topic_embed.rb; change the construction so the URL is
HTML-escaped before embedding (e.g., use ERB::Util.html_escape or CGI.escapeHTML
on the url) and ensure the href uses a validated/sanitized http(s) URI (parse
and reject non-http(s) schemes) before escaping; update the line that builds
contents (the interpolation of url inside I18n.t('embed.imported_from', link:
...)) to use the escaped/sanitized value so the rendered <a href='...'> and link
text cannot break out or inject XSS.
- Around line 44-53: The import_remote method currently calls open(url) and
Readability::Document without validation or error handling; replace the direct
open call in TopicEmbed.import_remote with a safe fetch: validate and parse the
URL (only allow http/https), resolve the host and reject private/internal IPs,
enforce a max redirect count, use a HTTP client (e.g., Net::HTTP) with explicit
open/read timeouts and a maximum allowed content-length, and limit the HTML size
read; wrap the fetch and Readability::Document creation in a rescue block to
handle timeouts, network errors, invalid content, and oversized responses and
return or raise a controlled error instead of crashing, keeping the call site
TopicEmbed.import(user, url, opts[:title] || doc.title, doc.content) unchanged.

In `@app/views/embed/best.html.erb`:
- Around line 3-5: The two link_to calls that open in a new tab (the ones
calling I18n.t('embed.title') and I18n.t('embed.start_discussion') with target:
'_blank' and using `@topic_view.topic.url`) need a rel attribute to prevent
reverse-tabnabbing; update each link_to invocation to include rel: 'noopener
noreferrer' alongside the existing class and target options.
- Around line 16-17: The avatar <img> lacks alt text; update the tag that uses
post.user.small_avatar_url to include an alt attribute (e.g., using
post.user.username or a fallback like "#{post.user.username} avatar") so screen
readers receive a meaningful description; ensure you use the same
post.user.username symbol for the alt and handle nil/blank username with a
sensible default.

In `@app/views/layouts/embed.html.erb`:
- Around line 9-12: The postMessage call currently uses request.referer as the
targetOrigin which is unreliable and unsafe; update the embed HTML so
parent.postMessage uses a validated origin instead—prefer reading
SiteSetting.embeddable_host (or a derived safeOrigin constant) and fall back to
a safe default only if explicitly configured; ensure you validate/normalize the
origin before passing it to parent.postMessage and avoid using raw
request.referer directly in the targetOrigin parameter (refer to the
parent.postMessage usage and request.referer in embed.html.erb).

In `@config/routes.rb`:
- Around line 245-246: The project is using Rails 3.2.12 (Gemfile entry for
rails) which is vulnerable to CVE-2014-0130; update the Rails dependency in the
Gemfile to a secure version (preferably upgrade the app to Rails 4.x as the
codebase already supports it) then run bundle update to refresh gems, run the
test suite and fix any deprecations/migration issues (routes like the new get
'embed/best' => 'embed#best' may need no change but verify routing and globbing
behavior), and ensure all security-related tests/passive scans are re-run before
merging.

In `@lib/post_revisor.rb`:
- Around line 85-86: In update_post, capture the boolean result of
`@post.save`(validate: !@opts[:skip_validations]) and handle a failure the same
way PostCreator and TopicCreator do: if save returns false (or exception occurs)
raise a runtime error (or log and raise) to prevent subsequent
save_reply_relationships from running and creating inconsistent state; reference
update_post, `@post.save`(validate: !@opts[:skip_validations]), and
save_reply_relationships when making the change so the save outcome is checked
and failures are surfaced consistently.

In `@lib/topic_retriever.rb`:
- Around line 48-50: The fetch_http method calls
SiteSetting.embed_by_username.downcase which will raise if the setting is nil;
update fetch_http to guard the setting before calling downcase (e.g., check
SiteSetting.embed_by_username.blank? or use to_s/safe navigation) and return
early if unset, then proceed to lookup User.where(username_lower: ...) so that
User lookup only runs when embed_by_username is present; reference fetch_http,
SiteSetting.embed_by_username, and User.where to locate the change.
- Around line 27-30: The current two-step throttle uses
$redis.setnx(retrieved_key, "1") followed by $redis.expire(retrieved_key, 60),
which can leave the key permanent on a crash; replace this with a single atomic
Redis SET that uses NX and EX options (i.e., use the Redis client's SET with NX
and a 60s TTL) and remove the separate expire call so the key is created only
when absent and always has the expiration.

In `@spec/jobs/poll_feed_spec.rb`:
- Around line 18-27: Make the specs deterministic by stubbing
SiteSetting.feed_polling_enabled? to true in each example and ensuring only the
intended setting is nil; for the "requires feed_polling_url" test stub
SiteSetting.feed_polling_enabled? => true and stub SiteSetting.embed_by_username
to a valid non-nil value before calling poller.execute, and for the "requires
embed_by_username" test stub SiteSetting.feed_polling_enabled? => true and stub
SiteSetting.feed_polling_url to a valid non-nil value so poller.execute only
fails due to the specifically tested missing setting (references:
poller.execute, poller.expects(:poll_feed).never,
SiteSetting.stubs(:feed_polling_url), SiteSetting.stubs(:embed_by_username),
SiteSetting.stubs(:feed_polling_enabled?)).
🧹 Nitpick comments (10)
app/assets/stylesheets/embed.css.scss (1)

1-69: LGTM with minor suggestion: consider using variables for colors and dimensions.

The stylesheet is well-structured. For maintainability, consider extracting repeated values into SCSS variables:

// At top of file
$embed-border-color: `#ddd`;
$embed-muted-text: `#aaa`;
$embed-author-color: `#4a6b82`;
$embed-author-width: 92px;

This is optional and can be deferred if consistent with existing codebase patterns.

db/migrate/20131223171005_create_top_topics.rb (1)

3-3: Avoid force: true in table creation migrations.

Using force: true in create_table will drop and recreate the table if it already exists. While this pattern was common during Discourse's early development (2013), it's not a Rails best practice and creates unnecessary fragility. In production, migrations only run once and are tracked, so the data loss risk is lower than it might appear, but the pattern should still be avoided in favor of explicit error handling or separate drop/create migrations with appropriate safeguards.

This pattern appears in 11 migrations across the codebase from 2012–2013 and should be addressed as part of a broader migration cleanup effort.

Gemfile (1)

209-211: Pin gem versions to ensure reproducibility and compatibility.

The new gems lack version constraints. Without pinning, future bundle update runs could introduce breaking changes. Specify compatible versions:

 # required for feed importing and embedding
-gem 'ruby-readability', require: false
-gem 'simple-rss', require: false
+gem 'ruby-readability', '~> 0.7', require: false
+gem 'simple-rss', '~> 1.3', require: false

Note: simple-rss hasn't been updated since April 2018—consider verifying it meets your needs before locking to this version.

db/migrate/20131217174004_create_topic_embeds.rb (1)

3-11: Consider adding indexes on topic_id and post_id for query performance.

The table lacks indexes on topic_id and post_id, which may be used for lookups when retrieving embeds by topic or post. Additionally, consider adding foreign key constraints for referential integrity.

♻️ Proposed improvement
     add_index :topic_embeds, :embed_url, unique: true
+    add_index :topic_embeds, :topic_id
+    add_index :topic_embeds, :post_id
   end
 end
app/jobs/regular/retrieve_topic.rb (2)

1-2: Remove unused require_dependency.

The email/sender dependency appears unused in this job. Only topic_retriever is needed.

♻️ Proposed fix
-require_dependency 'email/sender'
 require_dependency 'topic_retriever'

4-22: Minor formatting: Extra empty lines at module boundaries.

Static analysis flagged extra empty lines at lines 5-6 and 21-22. This is a minor convention issue per RuboCop's Layout/EmptyLinesAroundModuleBody rule.

♻️ Proposed fix
 module Jobs
-
   # Asynchronously retrieve a topic from an embedded site
   class RetrieveTopic < Jobs::Base
     # ... (no changes to execute method)
   end
-
 end
app/models/topic_embed.rb (1)

56-76: Port handling may produce incorrect URLs for standard ports.

Line 59 appends the port when it's neither 80 nor 443, but this doesn't account for the scheme. For example, an HTTPS URL on port 80 (non-standard) should include the port, but currently wouldn't. Also, the a and img processing blocks are duplicated.

♻️ Proposed refactor to fix port logic and reduce duplication
   def self.absolutize_urls(url, contents)
     uri = URI(url)
     prefix = "#{uri.scheme}://#{uri.host}"
-    prefix << ":#{uri.port}" if uri.port != 80 && uri.port != 443
+    # Only include port if it differs from the scheme's default
+    default_port = uri.scheme == 'https' ? 443 : 80
+    prefix << ":#{uri.port}" if uri.port != default_port
 
     fragment = Nokogiri::HTML.fragment(contents)
-    fragment.css('a').each do |a|
-      href = a['href']
-      if href.present? && href.start_with?('/')
-        a['href'] = "#{prefix}/#{href.sub(/^\/+/, '')}"
-      end
-    end
-    fragment.css('img').each do |a|
-      src = a['src']
-      if src.present? && src.start_with?('/')
-        a['src'] = "#{prefix}/#{src.sub(/^\/+/, '')}"
+    %w[a img].each do |tag|
+      attr = tag == 'a' ? 'href' : 'src'
+      fragment.css(tag).each do |el|
+        value = el[attr]
+        if value.present? && value.start_with?('/')
+          el[attr] = "#{prefix}/#{value.sub(/^\/+/, '')}"
+        end
       end
     end
 
     fragment.to_html
   end
app/jobs/scheduled/poll_feed.rb (1)

20-22: Unused method: feed_key is defined but never called.

This method generates a cache key but isn't used anywhere in the current implementation. Consider removing it if caching isn't planned, or implement the intended caching logic.

lib/topic_retriever.rb (1)

40-43: Synchronous feed polling may cause request timeouts.

Calling Jobs::PollFeed.new.execute({}) synchronously in the request path could significantly delay the response while the entire feed is fetched and processed. Consider checking if this is truly necessary here or if the feed polling should remain solely as a scheduled background job.

app/views/embed/loading.html.erb (1)

6-11: No reload limit: page will refresh indefinitely until content loads.

The script reloads the page every 30 seconds with no limit. If the content never becomes available (e.g., fetch failure, invalid URL), users will experience an infinite reload loop. Consider adding a maximum retry count or displaying an error after several attempts.

♻️ Proposed fix with retry limit
 <script>
   (function() {
+    var maxRetries = 10;
+    var retryCount = parseInt(sessionStorage.getItem('embedRetryCount') || '0');
+    
+    if (retryCount >= maxRetries) {
+      document.querySelector('header').innerHTML = '<%= t("embed.failed_to_load") %>';
+      sessionStorage.removeItem('embedRetryCount');
+      return;
+    }
+    
+    sessionStorage.setItem('embedRetryCount', retryCount + 1);
     setTimeout(function() {
       document.location.reload();
     }, 30000);
   })();
 </script>

Comment on lines +5 to +12
var comments = document.getElementById('discourse-comments'),
iframe = document.createElement('iframe');
iframe.src = discourseUrl + "embed/best?embed_url=" + encodeURIComponent(discourseEmbedUrl);
iframe.id = 'discourse-embed-frame';
iframe.width = "100%";
iframe.frameBorder = "0";
iframe.scrolling = "no";
comments.appendChild(iframe);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add null check for the target DOM element.

If the discourse-comments element doesn't exist on the page, comments.appendChild(iframe) will throw an error. Consider adding defensive validation.

🛡️ Proposed fix
 var comments = document.getElementById('discourse-comments'),
     iframe = document.createElement('iframe');
+if (!comments) { return; }
 iframe.src = discourseUrl + "embed/best?embed_url=" + encodeURIComponent(discourseEmbedUrl);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
var comments = document.getElementById('discourse-comments'),
iframe = document.createElement('iframe');
iframe.src = discourseUrl + "embed/best?embed_url=" + encodeURIComponent(discourseEmbedUrl);
iframe.id = 'discourse-embed-frame';
iframe.width = "100%";
iframe.frameBorder = "0";
iframe.scrolling = "no";
comments.appendChild(iframe);
var comments = document.getElementById('discourse-comments'),
iframe = document.createElement('iframe');
if (!comments) { return; }
iframe.src = discourseUrl + "embed/best?embed_url=" + encodeURIComponent(discourseEmbedUrl);
iframe.id = 'discourse-embed-frame';
iframe.width = "100%";
iframe.frameBorder = "0";
iframe.scrolling = "no";
comments.appendChild(iframe);
🤖 Prompt for AI Agents
In `@app/assets/javascripts/embed.js` around lines 5 - 12, The code assumes the
DOM element with id 'discourse-comments' exists and calls
comments.appendChild(iframe), which will throw if comments is null; update the
logic where comments is obtained (variable comments referencing
document.getElementById('discourse-comments')) to guard against null by checking
if comments is truthy before creating/appending the iframe (or return/exit
early), so only create iframe and set iframe.src/attributes and call
comments.appendChild(iframe) when comments is present.

Comment on lines +15 to +24
function postMessageReceived(e) {
if (!e) { return; }
if (discourseUrl.indexOf(e.origin) === -1) { return; }

if (e.data) {
if (e.data.type === 'discourse-resize' && e.data.height) {
iframe.height = e.data.height + "px";
}
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Origin validation is vulnerable to bypass attacks.

The indexOf check on line 17 can be bypassed by malicious origins. For example, if discourseUrl is "https://discourse.example.com", an attacker hosting content at "https://discourse.example.com.evil.com" would pass this check since indexOf would return 0.

🔒 Proposed fix for stricter origin validation
 function postMessageReceived(e) {
   if (!e) { return; }
-  if (discourseUrl.indexOf(e.origin) === -1) { return; }
+  // Extract origin from discourseUrl for strict comparison
+  var defined = document.createElement('a');
+  defined.href = discourseUrl;
+  var expectedOrigin = defined.protocol + '//' + defined.host;
+  if (e.origin !== expectedOrigin) { return; }

   if (e.data) {
🤖 Prompt for AI Agents
In `@app/assets/javascripts/embed.js` around lines 15 - 24, postMessageReceived
currently uses discourseUrl.indexOf(e.origin) which is vulnerable to
host-substring bypasses; change the origin check to compare origins exactly by
using the URL origin comparison (e.g. compute new URL(discourseUrl).origin and
compare it === e.origin) or maintain a whitelist array of allowed origins and
check e.origin against that list with strict equality; update the validation in
postMessageReceived to use e.origin === allowedOrigin (or
allowedOrigins.includes(e.origin)) and leave the rest of the message handling
(e.data.type === 'discourse-resize' and iframe.height assignment) unchanged.

Comment on lines +24 to +38
def poll_feed
user = User.where(username_lower: SiteSetting.embed_by_username.downcase).first
return if user.blank?

require 'simple-rss'
rss = SimpleRSS.parse open(SiteSetting.feed_polling_url)

rss.items.each do |i|
url = i.link
url = i.id if url.blank? || url !~ /^https?\:\/\//

content = CGI.unescapeHTML(i.content.scrub)
TopicEmbed.import(user, url, i.title, content)
end
end

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Missing error handling and potential NoMethodError on nil content.

The poll_feed method lacks error handling for network failures and malformed feeds. Additionally, line 35 calls .scrub on i.content which will raise NoMethodError if the feed item has no content field.

🐛 Proposed fix with error handling and nil safety
     def poll_feed
       user = User.where(username_lower: SiteSetting.embed_by_username.downcase).first
       return if user.blank?
 
-      require 'simple-rss'
-      rss = SimpleRSS.parse open(SiteSetting.feed_polling_url)
-
-      rss.items.each do |i|
-        url = i.link
-        url = i.id if url.blank? || url !~ /^https?\:\/\//
-
-        content = CGI.unescapeHTML(i.content.scrub)
-        TopicEmbed.import(user, url, i.title, content)
+      begin
+        require 'simple-rss'
+        feed_response = URI.open(SiteSetting.feed_polling_url, read_timeout: 30, open_timeout: 10)
+        rss = SimpleRSS.parse(feed_response)
+
+        rss.items.each do |i|
+          url = i.link
+          url = i.id if url.blank? || url !~ /^https?\:\/\//
+
+          raw_content = i.content || i.description || ''
+          content = CGI.unescapeHTML(raw_content.to_s.scrub)
+          TopicEmbed.import(user, url, i.title, content)
+        end
+      rescue StandardError => e
+        Rails.logger.error("PollFeed failed: #{e.message}")
       end
     end
🧰 Tools
🪛 Brakeman (8.0.1)

[medium] 29-29: Model attribute used in file name
Type: File Access
Confidence: Medium
More info: https://brakemanscanner.org/docs/warning_types/file_access/

(File Access)

🤖 Prompt for AI Agents
In `@app/jobs/scheduled/poll_feed.rb` around lines 24 - 38, The poll_feed method
lacks network/feed error handling and calls i.content.scrub without nil checks;
wrap the feed fetch and parse (open(SiteSetting.feed_polling_url) and
SimpleRSS.parse) in a begin/rescue to catch and log exceptions, and skip
processing on failure, and before calling .scrub on i.content ensure i.content
is present (e.g., use a safe fallback like an empty string or i.description) so
TopicEmbed.import is never called with nil content; log any errors with context
(feed URL and exception) and continue processing remaining items.

def self.import(user, url, title, contents)
return unless url =~ /^https?\:\/\//

contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{url}'>#{url}</a>")}</small>\n"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Potential XSS: URL is interpolated into HTML without escaping.

The url parameter is directly embedded into the HTML string. While validated against an HTTP(S) regex, URLs can still contain characters that break out of the href attribute or introduce XSS payloads (e.g., javascript: protocol after redirects, or characters like " and >).

🛡️ Proposed fix using ERB::Util for escaping
+  require 'erb'
+
   # Import an article from a source (RSS/Atom/Other)
   def self.import(user, url, title, contents)
     return unless url =~ /^https?\:\/\//
 
-    contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{url}'>#{url}</a>")}</small>\n"
+    escaped_url = ERB::Util.html_escape(url)
+    contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{escaped_url}'>#{escaped_url}</a>")}</small>\n"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{url}'>#{url}</a>")}</small>\n"
require 'erb'
# Import an article from a source (RSS/Atom/Other)
def self.import(user, url, title, contents)
return unless url =~ /^https?\:\/\//
escaped_url = ERB::Util.html_escape(url)
contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{escaped_url}'>#{escaped_url}</a>")}</small>\n"
end
🤖 Prompt for AI Agents
In `@app/models/topic_embed.rb` at line 13, The HTML injection risk comes from
interpolating the raw url into the I18n string in topic_embed.rb; change the
construction so the URL is HTML-escaped before embedding (e.g., use
ERB::Util.html_escape or CGI.escapeHTML on the url) and ensure the href uses a
validated/sanitized http(s) URI (parse and reject non-http(s) schemes) before
escaping; update the line that builds contents (the interpolation of url inside
I18n.t('embed.imported_from', link: ...)) to use the escaped/sanitized value so
the rendered <a href='...'> and link text cannot break out or inject XSS.

Comment on lines +44 to +53
def self.import_remote(user, url, opts=nil)
require 'ruby-readability'

opts = opts || {}
doc = Readability::Document.new(open(url).read,
tags: %w[div p code pre h1 h2 h3 b em i strong a img],
attributes: %w[href src])

TopicEmbed.import(user, url, opts[:title] || doc.title, doc.content)
end

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Security: SSRF vulnerability and missing error handling in import_remote.

Using open(url) directly with user-controlled URLs exposes the application to Server-Side Request Forgery (SSRF) attacks. An attacker could potentially access internal network resources. Additionally, there's no timeout, error handling, or content-length limit which could lead to resource exhaustion.

🔒 Proposed fix with validation and error handling
   def self.import_remote(user, url, opts=nil)
     require 'ruby-readability'
 
     opts = opts || {}
+    
+    begin
+      uri = URI.parse(url)
+      raise "Invalid URL scheme" unless %w[http https].include?(uri.scheme)
+      
+      # Add timeout and redirect limits
+      response = uri.open(read_timeout: 30, open_timeout: 10)
+      html_content = response.read(1.megabyte) # Limit content size
+    rescue StandardError => e
+      Rails.logger.error("Failed to fetch embed URL #{url}: #{e.message}")
+      return nil
+    end
+    
-    doc = Readability::Document.new(open(url).read,
+    doc = Readability::Document.new(html_content,
                                         tags: %w[div p code pre h1 h2 h3 b em i strong a img],
                                         attributes: %w[href src])
 
     TopicEmbed.import(user, url, opts[:title] || doc.title, doc.content)
   end
🤖 Prompt for AI Agents
In `@app/models/topic_embed.rb` around lines 44 - 53, The import_remote method
currently calls open(url) and Readability::Document without validation or error
handling; replace the direct open call in TopicEmbed.import_remote with a safe
fetch: validate and parse the URL (only allow http/https), resolve the host and
reject private/internal IPs, enforce a max redirect count, use a HTTP client
(e.g., Net::HTTP) with explicit open/read timeouts and a maximum allowed
content-length, and limit the HTML size read; wrap the fetch and
Readability::Document creation in a rescue block to handle timeouts, network
errors, invalid content, and oversized responses and return or raise a
controlled error instead of crashing, keeping the call site
TopicEmbed.import(user, url, opts[:title] || doc.title, doc.content) unchanged.

Comment on lines +245 to +246
get 'embed/best' => 'embed#best'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Route addition looks good, but Rails version has critical vulnerability.

The new embed route follows existing conventions. However, static analysis flagged that Rails 3.2.12 (in Gemfile line 59) is vulnerable to CVE-2014-0130—a directory traversal and remote code execution vulnerability affecting globbing routes. This is a critical security issue that should be prioritized.

🔒 Recommended fix: Upgrade Rails

In the Gemfile (Rails 3.x branch):

-  gem 'rails', '3.2.12'
+  gem 'rails', '~> 3.2.22'

Or better yet, migrate to Rails 4.x which is already supported in this codebase.

🧰 Tools
🪛 Brakeman (8.0.1)

[medium] Rails 3.2.12 with globbing routes is vulnerable to directory traversal and remote code execution. Patch or upgrade to Rails 3.2.18
Type: Remote Code Execution
Confidence: Medium
More info: http://matasano.com/research/AnatomyOfRailsVuln-CVE-2014-0130.pdf

(Remote Code Execution)

🤖 Prompt for AI Agents
In `@config/routes.rb` around lines 245 - 246, The project is using Rails 3.2.12
(Gemfile entry for rails) which is vulnerable to CVE-2014-0130; update the Rails
dependency in the Gemfile to a secure version (preferably upgrade the app to
Rails 4.x as the codebase already supports it) then run bundle update to refresh
gems, run the test suite and fix any deprecations/migration issues (routes like
the new get 'embed/best' => 'embed#best' may need no change but verify routing
and globbing behavior), and ensure all security-related tests/passive scans are
re-run before merging.

Comment on lines +85 to +86
@post.save(validate: !@opts[:skip_validations])

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's look at the actual code at lines 85-86 and surrounding context
cd / && find . -name "post_revisor.rb" -type f 2>/dev/null | head -5

Repository: admin-coderabbit/coderabbit-discourse-mirror

Length of output: 127


🏁 Script executed:

# Check the file structure to find the correct path
git ls-files | grep -i post_revisor

Repository: admin-coderabbit/coderabbit-discourse-mirror

Length of output: 144


🏁 Script executed:

# View the code around lines 85-86
cat -n lib/post_revisor.rb | sed -n '75,100p'

Repository: admin-coderabbit/coderabbit-discourse-mirror

Length of output: 1077


🏁 Script executed:

# See the method name containing this code
cat -n lib/post_revisor.rb | sed -n '60,90p'

Repository: admin-coderabbit/coderabbit-discourse-mirror

Length of output: 1299


🏁 Script executed:

# Find all uses of skip_validations in the codebase
rg -n 'skip_validations' --type ruby

Repository: admin-coderabbit/coderabbit-discourse-mirror

Length of output: 1541


🏁 Script executed:

# Check the save_reply_relationships method implementation
rg -A 10 'def save_reply_relationships' lib/post_revisor.rb

Repository: admin-coderabbit/coderabbit-discourse-mirror

Length of output: 70


🏁 Script executed:

# Find save_reply_relationships - likely in Post model
rg -B 2 -A 15 'def save_reply_relationships' --type ruby

Repository: admin-coderabbit/coderabbit-discourse-mirror

Length of output: 953


🏁 Script executed:

# Look at how PostCreator handles the save return value
cat -n lib/post_creator.rb | sed -n '225,240p'

Repository: admin-coderabbit/coderabbit-discourse-mirror

Length of output: 525


🏁 Script executed:

# Look at how TopicCreator handles the save return value
cat -n lib/topic_creator.rb | sed -n '70,85p'

Repository: admin-coderabbit/coderabbit-discourse-mirror

Length of output: 591


🏁 Script executed:

# Check the revise! method in post_revisor to see if it handles errors
rg -B 5 -A 30 'def revise!' lib/post_revisor.rb | head -60

Repository: admin-coderabbit/coderabbit-discourse-mirror

Length of output: 700


🏁 Script executed:

# Check if revise! method does any error handling or transaction management
cat -n lib/post_revisor.rb | sed -n '1,50p'

Repository: admin-coderabbit/coderabbit-discourse-mirror

Length of output: 1416


🏁 Script executed:

# Check the context where skip_validations is used with revise!
cat -n app/models/topic_embed.rb | sed -n '30,45p'

Repository: admin-coderabbit/coderabbit-discourse-mirror

Length of output: 647


🏁 Script executed:

# Check if there's any transaction/error handling around revise! calls
rg -B 5 -A 5 'revise!' lib/ app/ --type ruby | grep -A 10 -B 10 'skip_validations'

Repository: admin-coderabbit/coderabbit-discourse-mirror

Length of output: 1055


Handle failed saves when skip_validations is enabled.

In the update_post method, @post.save(validate: !@opts[:skip_validations]) doesn't capture the return value, so if save fails and skip_validations is true, the failure goes unnoticed. Line 87's save_reply_relationships still executes, potentially creating inconsistent state. This is inconsistent with how PostCreator (line 232) and TopicCreator (line 78) handle the same pattern—both check the return value and raise exceptions on failure.

Consider either raising an exception (consistent with PostCreator/TopicCreator pattern) or logging the failure:

Suggested fix
-    `@post.save`(validate: !@opts[:skip_validations])
-
+    unless `@post.save`(validate: !@opts[:skip_validations])
+      Rails.logger.warn("PostRevisor: Failed to save post #{`@post.id`}: #{`@post.errors.full_messages.join`(', ')}")
+    end
     `@post.save_reply_relationships`
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@post.save(validate: !@opts[:skip_validations])
unless `@post.save`(validate: !@opts[:skip_validations])
Rails.logger.warn("PostRevisor: Failed to save post #{`@post.id`}: #{`@post.errors.full_messages.join`(', ')}")
end
🤖 Prompt for AI Agents
In `@lib/post_revisor.rb` around lines 85 - 86, In update_post, capture the
boolean result of `@post.save`(validate: !@opts[:skip_validations]) and handle a
failure the same way PostCreator and TopicCreator do: if save returns false (or
exception occurs) raise a runtime error (or log and raise) to prevent subsequent
save_reply_relationships from running and creating inconsistent state; reference
update_post, `@post.save`(validate: !@opts[:skip_validations]), and
save_reply_relationships when making the change so the save outcome is checked
and failures are surfaced consistently.

Comment on lines +27 to +30
if $redis.setnx(retrieved_key, "1")
$redis.expire(retrieved_key, 60)
return false
end

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Race condition in Redis throttling: key may persist indefinitely.

If the process crashes between setnx (line 27) and expire (line 28), the key will never expire and will block all future retrievals for that URL. Use SET with EX and NX options atomically instead.

🐛 Proposed fix using atomic SET with EX and NX
     def retrieved_recently?
       # We can disable the throttle for some users, such as staff
       return false if `@opts`[:no_throttle]
 
       # Throttle other users to once every 60 seconds
       retrieved_key = "retrieved:#{`@embed_url`}"
-      if $redis.setnx(retrieved_key, "1")
-        $redis.expire(retrieved_key, 60)
-        return false
-      end
-
-      true
+      # SET with NX and EX is atomic - prevents key from persisting if process crashes
+      !$redis.set(retrieved_key, "1", ex: 60, nx: true)
     end
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if $redis.setnx(retrieved_key, "1")
$redis.expire(retrieved_key, 60)
return false
end
def retrieved_recently?
# We can disable the throttle for some users, such as staff
return false if `@opts`[:no_throttle]
# Throttle other users to once every 60 seconds
retrieved_key = "retrieved:#{`@embed_url`}"
# SET with NX and EX is atomic - prevents key from persisting if process crashes
!$redis.set(retrieved_key, "1", ex: 60, nx: true)
end
🤖 Prompt for AI Agents
In `@lib/topic_retriever.rb` around lines 27 - 30, The current two-step throttle
uses $redis.setnx(retrieved_key, "1") followed by $redis.expire(retrieved_key,
60), which can leave the key permanent on a crash; replace this with a single
atomic Redis SET that uses NX and EX options (i.e., use the Redis client's SET
with NX and a 60s TTL) and remove the separate expire call so the key is created
only when absent and always has the expiration.

Comment on lines +48 to +50
def fetch_http
user = User.where(username_lower: SiteSetting.embed_by_username.downcase).first
return if user.blank?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Potential NoMethodError if embed_by_username is nil.

Line 49 calls .downcase on SiteSetting.embed_by_username without a nil check. If the setting is unconfigured, this will raise NoMethodError.

🛡️ Proposed fix with safe navigation
     def fetch_http
-      user = User.where(username_lower: SiteSetting.embed_by_username.downcase).first
+      username = SiteSetting.embed_by_username
+      return if username.blank?
+      
+      user = User.where(username_lower: username.downcase).first
       return if user.blank?
 
       TopicEmbed.import_remote(user, `@embed_url`)
     end
🤖 Prompt for AI Agents
In `@lib/topic_retriever.rb` around lines 48 - 50, The fetch_http method calls
SiteSetting.embed_by_username.downcase which will raise if the setting is nil;
update fetch_http to guard the setting before calling downcase (e.g., check
SiteSetting.embed_by_username.blank? or use to_s/safe navigation) and return
early if unset, then proceed to lookup User.where(username_lower: ...) so that
User lookup only runs when embed_by_username is present; reference fetch_http,
SiteSetting.embed_by_username, and User.where to locate the change.

Comment on lines +18 to +27
it "requires feed_polling_url" do
SiteSetting.stubs(:feed_polling_url).returns(nil)
poller.expects(:poll_feed).never
poller.execute({})
end

it "requires embed_by_username" do
SiteSetting.stubs(:embed_by_username).returns(nil)
poller.expects(:poll_feed).never
poller.execute({})

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Make the guard‑condition specs deterministic.

These examples can pass without exercising the intended guard if SiteSetting.feed_polling_enabled? is false by default. Stub all required settings so each test isolates exactly one missing requirement.

🛠️ Suggested update
 it "requires feed_polling_url" do
+  SiteSetting.stubs(:feed_polling_enabled?).returns(true)
   SiteSetting.stubs(:feed_polling_url).returns(nil)
+  SiteSetting.stubs(:embed_by_username).returns(embed_by_username)
   poller.expects(:poll_feed).never
   poller.execute({})
 end

 it "requires embed_by_username" do
+  SiteSetting.stubs(:feed_polling_enabled?).returns(true)
+  SiteSetting.stubs(:feed_polling_url).returns(url)
   SiteSetting.stubs(:embed_by_username).returns(nil)
   poller.expects(:poll_feed).never
   poller.execute({})
 end
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
it "requires feed_polling_url" do
SiteSetting.stubs(:feed_polling_url).returns(nil)
poller.expects(:poll_feed).never
poller.execute({})
end
it "requires embed_by_username" do
SiteSetting.stubs(:embed_by_username).returns(nil)
poller.expects(:poll_feed).never
poller.execute({})
it "requires feed_polling_url" do
SiteSetting.stubs(:feed_polling_enabled?).returns(true)
SiteSetting.stubs(:feed_polling_url).returns(nil)
SiteSetting.stubs(:embed_by_username).returns(embed_by_username)
poller.expects(:poll_feed).never
poller.execute({})
end
it "requires embed_by_username" do
SiteSetting.stubs(:feed_polling_enabled?).returns(true)
SiteSetting.stubs(:feed_polling_url).returns(url)
SiteSetting.stubs(:embed_by_username).returns(nil)
poller.expects(:poll_feed).never
poller.execute({})
end
🤖 Prompt for AI Agents
In `@spec/jobs/poll_feed_spec.rb` around lines 18 - 27, Make the specs
deterministic by stubbing SiteSetting.feed_polling_enabled? to true in each
example and ensuring only the intended setting is nil; for the "requires
feed_polling_url" test stub SiteSetting.feed_polling_enabled? => true and stub
SiteSetting.embed_by_username to a valid non-nil value before calling
poller.execute, and for the "requires embed_by_username" test stub
SiteSetting.feed_polling_enabled? => true and stub SiteSetting.feed_polling_url
to a valid non-nil value so poller.execute only fails due to the specifically
tested missing setting (references: poller.execute,
poller.expects(:poll_feed).never, SiteSetting.stubs(:feed_polling_url),
SiteSetting.stubs(:embed_by_username),
SiteSetting.stubs(:feed_polling_enabled?)).

Sign in to join this conversation on GitHub.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant